Video Mask Transfiner for High-Quality Video Instance Segmentation
Abstract
While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details. Moreover, the predicted segmentations often fluctuate over time, suggesting that temporal consistency cues are neglected or not fully utilized. In this paper, we set out to tackle these issues, with the aim of achieving highly detailed and more temporally stable mask predictions for VIS. We first propose the Video Mask Transfiner (VMT) method, which can leverage fine-grained high-resolution features thanks to an efficient video transformer structure. Our VMT detects and groups sparse, error-prone spatio-temporal regions of each tracklet in the video segment, which are then refined using both local and instance-level cues. Second, we identify the coarse annotations of the popular YouTube-VIS dataset as a major limiting factor. Based on our VMT architecture, we therefore design an automated annotation refinement approach using iterative training and self-correction. To benchmark high-quality VIS, we introduce the HQ-YTVIS dataset, consisting of a manually re-annotated test set and automatically refined training data. We compare VMT with the most recent state-of-the-art methods on HQ-YTVIS, as well as on the YouTube-VIS, OVIS and BDD100K MOTS benchmarks. Experimental results clearly demonstrate the effectiveness of our method in segmenting complex and dynamic objects, capturing precise ...
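The abstract says VMT detects sparse, error-prone spatio-temporal regions in each tracklet and refines only those. To give a rough feel for why such regions are sparse, the sketch below flags pixels with a simple hand-written heuristic: a thin band around each frame's mask boundary, plus pixels whose label flips between adjacent frames. This is an assumption for illustration only. The abstract does not say how VMT detects these regions, and the helper names (`dilate`, `erode`, `error_prone_regions`) and the radius `r` are hypothetical.

```python
import numpy as np


def dilate(mask: np.ndarray, r: int = 1) -> np.ndarray:
    """Binary dilation with a (2r+1)x(2r+1) square, via padded shifts (no SciPy)."""
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # OR in the mask shifted by (dy, dx).
            out |= padded[r + dy : r + dy + h, r + dx : r + dx + w]
    return out


def erode(mask: np.ndarray, r: int = 1) -> np.ndarray:
    """Binary erosion as the complement of dilating the complement.

    Pixels outside the frame are treated as foreground here, so mask pixels
    touching the image border are not eroded from that side.
    """
    return ~dilate(~mask, r)


def error_prone_regions(masks: np.ndarray, r: int = 1) -> np.ndarray:
    """Flag candidate error-prone pixels in a (T, H, W) mask tracklet.

    Hypothetical heuristic stand-in, NOT VMT's detector:
      * spatial: pixels within r of each frame's mask boundary
      * temporal: pixels whose label flips relative to an adjacent frame
    """
    masks = masks.astype(bool)
    spatial = np.stack([dilate(m, r) & ~erode(m, r) for m in masks])
    temporal = np.zeros_like(masks)
    flips = masks[1:] != masks[:-1]  # (T-1, H, W): label changed between t and t+1
    temporal[1:] |= flips            # mark the later frame of each changed pair
    temporal[:-1] |= flips           # mark the earlier frame as well
    return spatial | temporal


# Toy tracklet: a 4x4 square in a 10x10 frame, shifting right by one pixel per frame.
T, H, W = 3, 10, 10
masks = np.zeros((T, H, W), dtype=bool)
for t in range(T):
    masks[t, 3:7, 3 + t : 7 + t] = True

regions = error_prone_regions(masks)
print(regions.shape, int(regions.sum()), "of", regions.size, "pixels flagged")
```

On this toy tracklet, 96 of the 300 pixels are flagged. For realistic masks the boundary band grows with the perimeter rather than the area, so it covers a much smaller share of each frame, which is presumably why restricting refinement to such regions keeps high-resolution processing affordable.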
Similar papers
MaskRNN: Instance Level Video Object Segmentation
Instance level video object segmentation is an important technique for video editing and compression. To capture the temporal coherence, in this paper, we develop MaskRNN, a recurrent neural net approach which fuses in each frame the output of two deep nets for each object instance — a binary segmentation net providing a mask and a localization net providing a bounding box. Due to the recurrent...
A mask matching approach for video segmentation on compressed data
Video segmentation provides an easy and efficient way for video retrieval and browsing. A frame is detected as a shot change frame if its content is very different from its previous frames. The process of segmenting videos into shots is usually time consuming due to the large number of frames in the videos. In this paper, we propose a new approach for segmenting videos into shots on MPEG coded vid...
A Machine Learning Approach to No-Reference Objective Video Quality Assessment for High Definition Resources
The video quality assessment must be adapted to the human visual system, which is why researchers have performed subjective viewing experiments in order to obtain the conditions of encoding of video systems to provide the best quality to the user. The objective of this study is to assess the video quality using image features extraction without using reference video. RMSE values and processing ...
A Novel Unsupervised Approach for Minimally-Invasive Video Segmentation
Background: Laparoscopy, or minimally invasive surgery, is a surgical procedure in which a laparoscope and other surgical instruments are inserted into the body through a few small incisions. The laparoscope is used to look inside the patient's body and record the displayed images. Temporal segmentation of laparoscopic videos has many applications, such as detecting laparoscopic anomalies and interruptions. It is prer...
High Quality Video Acquisition and Segmentation Using Alternate Flashing System
A high quality video acquisition algorithm is proposed in this work. We construct a flashing system to capture lit and unlit frames alternately. We develop a reliable motion estimation scheme, which matches correspondences between an unlit frame and a lit frame. Then, we construct a high quality frame, which combines natural scene mood in the unlit frame and textural details in the lit frame. F...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-19815-1_42